Non Parametric Methods for Genomic Inference
Authors
Abstract
Large-scale statistical analysis of data sets associated with genome sequences plays an important role in modern biology. A key component of such statistical analyses is the computation of p-values and confidence bounds for statistics defined on the genome. Currently such computation is commonly achieved through ad hoc simulation measures. The method of randomization, which is at the heart of these simulation procedures, can significantly affect the resulting statistical conclusions. Most simulation schemes introduce a variety of hidden assumptions regarding the nature of the randomness in the data, resulting in a failure to capture biologically meaningful relationships. To address the need for a method of assessing the significance of observations within large-scale genomic studies, where there often exists a complex dependency structure between observations, we propose a unified solution built upon a data subsampling approach. We propose a piecewise stationary model for genome sequences and show that the subsampling approach gives correct answers under this model. We illustrate the method on three simulation studies and two real data examples.
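As a rough illustration of what a data subsampling approach can look like, the sketch below computes a confidence interval for a statistic of a dependent sequence by recomputing it on every contiguous block. It is a generic block-subsampling recipe under simplifying assumptions, not the procedure of the paper: the sequence is treated as a single stationary stretch (the piecewise stationary model is not implemented), the statistic is assumed to converge at a square-root rate, and the names `block_subsample_ci`, `block_len`, and `alpha` are hypothetical.

```python
import numpy as np


def block_subsample_ci(x, block_len, alpha=0.05, statistic=np.mean):
    """Approximate (1 - alpha) confidence interval via block subsampling.

    Assumptions (illustrative, not taken from the abstract): x is a single
    stationary sequence and the statistic converges at rate sqrt(n).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if not 1 <= block_len < n:
        raise ValueError("block_len must satisfy 1 <= block_len < len(x)")

    # Statistic on the full sequence.
    theta_n = statistic(x)

    # Statistic on every contiguous block of length block_len. Contiguous
    # blocks keep the local dependence between neighbouring positions.
    starts = range(n - block_len + 1)
    theta_b = np.array([statistic(x[s:s + block_len]) for s in starts])

    # Rescaled deviations of block statistics from the full-sample value;
    # their empirical distribution stands in for that of
    # sqrt(n) * (theta_n - theta).
    roots = np.sqrt(block_len) * (theta_b - theta_n)
    q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])

    # Invert the approximate distribution to obtain the interval.
    return theta_n - q_hi / np.sqrt(n), theta_n - q_lo / np.sqrt(n)
```

Called as `block_subsample_ci(signal, block_len=50)`, this returns a lower and an upper bound for the mean of `signal`. The choice of `block_len` matters: it has to be long enough to capture the dependence between neighbouring positions yet much shorter than the sequence itself.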
Similar resources
Predictive Ability of Statistical Genomic Prediction Methods When Underlying Genetic Architecture of Trait Is Purely Additive
A simulation study was conducted to examine how a purely additive (simple) genetic architecture might affect the efficacy of parametric and non-parametric genomic prediction methods. For this purpose, we simulated a trait with narrow-sense heritability h² = 0.3, with only additive genetic effects at 300 loci, in order to compare the predictive ability of 14 more practically used ge...
Accuracy of Genomic Selection of Parametric and Non-Parametric Methods with Additive and Dominance Genetic Architectures
In most genomic prediction studies, only additive effects are used in models for estimating genomic breeding values (GEBV). However, dominance genetic effects are an important source of variation for complex traits, and taking them into account may improve the accuracy of GEBV. In the present study, performed using simulated data, the effect of different heritability values (0.1...
Optimal Fuzzy Nonparametric Predictive Inference for a Single-Stage Acceptance Sampling Plan
Acceptance sampling is one of the main parts of statistical quality control. It is primarily used for the inspection of incoming or outgoing lots. Acceptance sampling procedures can be used in an acceptance control program to achieve better quality at lower expense, improved control, and increased efficiency. The aims of this paper, studying acceptance sampling based on non-...
Bayesian Nonparametric and Parametric Inference
This paper reviews Bayesian Nonparametric methods and discusses how parametric predictive densities can be constructed using nonparametric ideas.
Assessing significance [JF 20]
With the exception of Bayesian analysis, phylogenetic inference procedures typically identify a best estimate of phylogenetic relationships, a so-called point estimate of the phylogeny. However, the point estimate is often relatively uninteresting in itself unless we have some measure of its reliability. This lecture will be about techniques for examining the robustness or significance of the r...
Tuning and Application of the Random Forest Algorithm in Genomic Evaluation
One of the most important issues in genomic selection is choosing a suitable method for estimating marker effects and performing genomic evaluation. Recently, machine learning algorithms, which belong to the family of non-parametric and non-linear methods, have been extended to genomic evaluation. One of these methods is Random Forest (RF), on which this research focused. Important parameters in the RF algorithm are the n...